Temporal Models and Smoothing
Data are often observed in time, and time dependence is often expected.
Note: We can use the same model to smooth covariate effects!
Smoothing of the time effect
Prediction
We can “predict” any unobserved data, does not have to be in the future
Time can be indexed over a
Discrete domain (e.g., years)
Continuous domain
Time can be indexed over a
Discrete domain (e.g., years)
Main models: RW1, RW2 and AR1
Note: RW1 and RW2 are also used for smoothing covariates
Continuous domain
Goal we want understand the pattern and predict into the future
Random walk models encourage the mean of the linear predictor to vary gradually over time.
They do this by assuming that, on average, the time effect at each point is the mean of the effect at the neighboring points.
Random Walk of order 1 (RW1) we take the two nearest neighbors
Random Walk of order 2 (RW2) we take the four nearest neighbors
Idea: \(\longrightarrow\ u_t = \text{mean}(u_{t-1} , u_{t+1}) + \text{Gaussian error with precision } \tau\)
Definition
\[ \pi(\mathbf{u} \mid \tau) \propto \exp\!\left( -\frac{\tau}{2} \sum_{t=1}^{T-1} (u_{t+1} - u_t)^2 \right) = \exp\!\left(-\tfrac{1}{2} \, \mathbf{u}^{\top} \mathbf{Q}\ \mathbf{u}\right) \]
\[ \mathbf{Q} = \tau \begin{bmatrix} 1 & -1 & & & & \\ -1 & 2 & -1 & & & \\ & & \ddots & \ddots & \ddots & \\ & & & -1 & 2 & -1 \\ & & & & -1 & 1 \end{bmatrix} \]
Idea: \(\longrightarrow\ u_t = \text{mean}(u_{t-1} , u_{t+1}) + \text{Gaussian error with precision } \tau\)
Definition
\[ \pi(\mathbf{u} \mid \tau) \propto \exp\!\left( -\frac{\tau}{2} \sum_{t=1}^{T-1} (u_{t+1} - u_t)^2 \right) = \exp\!\left(-\tfrac{1}{2} \, \mathbf{u}^{\top} \mathbf{Q}\ \mathbf{u}\right) \]
\(\tau\) says how much \(u_t\) can vary around its mean
We need to set a prior distribution for \(\tau\).
A common option is the so called PC-priors
inlabru for many model parametersThey are build with two principle in mind:
A line is the base model
We want to penalize more complex models
PC prior are easily available in inlabru for many model parameters
They are build with two principle in mind:
\[ \begin{eqnarray} \sigma = \sqrt{1/\tau} \\ \text{Prob}(\sigma>U) = \alpha;\\ \qquad U>0, \ \alpha \in (0,1) \end{eqnarray} \]
\(U\) an upper limit for the standard deviation and \(\alpha\) a small probability.
\(U\) a likely value for the standard deviation and \(\alpha=0.5\).
The Model
\[ \begin{aligned} y_i|\eta_i, \sigma^2 & \sim \mathcal{N}(\eta_i,\sigma^2)\\ \eta_i & = \beta_0 + f(t_i)\\ f(t_1),f(t_2),\dots,f(t_n) &\sim \text{RW2}(\tau) \end{aligned} \]
RW1 defines differences, not absolute levels:
Only the changes between neighboring terms are modeled.
Mathematically, \[ (u_1,\dots,u_n)\text{ and }(u_1+a,\dots,u_n+a) \] produce identical likelihoods — they’re indistinguishable.
This means:
Solution:
[1] "FIT1 - Intercept"
mean sd 0.025quant 0.5quant 0.975quant mode kld
Intercept 174.138 0.002 174.135 174.138 174.142 174.138 0
[1] "FIT2 - Intercept"
mean sd 0.025quant 0.5quant 0.975quant mode kld
Intercept 0 31.623 -62.009 0 62.009 0 0
[1] "FIT1 - RW1 effect"
ID mean sd 0.025quant 0.5quant 0.975quant mode kld
1 1918 -0.122 0.015 -0.152 -0.122 -0.090 -0.122 0
2 1919 0.039 0.015 0.005 0.040 0.068 0.042 0
[1] "FIT2 - RW1 effect"
ID mean sd 0.025quant 0.5quant 0.975quant mode kld
1 1918 174.017 31.623 112.008 174.017 236.025 174.017 0
2 1919 174.177 31.623 112.168 174.177 236.185 174.177 0
\[ u_t = \text{mean}(u_{t-2} ,u_{t-1} , u_{t+1}, u_{t+2} ) + \text{some Gaussian error with precision } \tau \]
RW2 are smoother than RW1
The precision has the same role as for RW1
cmp1 = ~ Intercept(1) +
time(year, model = "rw1",
scale.model = T,
hyper = list(prec =
list(prior = "pc.prec",
param = c(0.3,0.5))))
cmp2 = ~ Intercept(1) +
time(year, model = "rw2",
scale.model = T,
hyper = list(prec =
list(prior = "pc.prec",
param = c(0.3,0.5))))
lik = bru_obs(formula = Erie~ .,
data = lakes)
fit1 = bru(cmp1, lik)
fit2 = bru(cmp2, lik)NOTE: the scale.model = TRUE option scales the \(\mathbf{Q}\) matrix so the precision parameter has the same interpretation in both models.
RW models are discrete models
Covariates are often recorded as continuous values
The function inla.group() will bin covariate values into groups (default 25 groups)
inla.group(x, n = 25, method = c("cut", "quantile"))
Two ways to bin
cut (default) splits the data using equal length intervalsquantile uses equidistant quantiles in the probability space.The data are derived from an anthropometric study of 892 females under 50 years in three Gambian villages in West Africa.
Age - Age of respondent (continuous)
triceps - Triceps skinfold thickness.
Latent effects suitable for smoothing and modeling temporal data.
One hyperparameter: the precision \(\tau\)
It is an intrinsic model
The precision matrix \(\mathbf{Q}\) is rank deficient
A sum-to-zero constraint is added to make the model identifiable!
RW2 models are smoother than RW1
Definition
\[ u_t = \phi u_{t-i} + \epsilon_t; \qquad \phi\in(-1,1), \ \epsilon_t\sim\mathcal{N}(0,\tau^{-1}) \]
\[ \pi(\mathbf{u}|\tau)\propto\exp\left(-\frac{\tau}{2}\mathbf{u}^T\mathbf{Q}\mathbf{u}\right) \]
with
\[ \mathbf{Q} = \begin{bmatrix} 1 & -\phi & & & & \\ -\phi & (1+\phi^2) & -\phi & & & \\ & & \ddots & \ddots & \ddots & \\ & & & -\phi & (1+\phi^2) & -\phi \\ & & & & -\phi & 1 \end{bmatrix} \]
The AR1 model has two parameters
The AR1 model has two parameters
The precision \(\tau\)
pc.prec \[
\text{Prob}(\sigma > u) = \alpha
\]The autocorrelation (or persistence) parameter $(-1,1)
pc.cor0\[ \begin{eqnarray} \text{Prob}(|\rho| > u) = \alpha;\\ -1<u<1;\ 0<\alpha<1 \end{eqnarray} \]
The AR1 model has two parameters
The precision \(\tau\)
pc.prec \[
\text{Prob}(\sigma > u) = \alpha
\]The autocorrelation (or persistence) parameter $(-1,1)
pc.cor1\[ \begin{eqnarray} \text{Prob}(\rho > u) = \alpha;&\\ -1<u<1;\qquad &\sqrt{\frac{1-u}{2}}<\alpha<1 \end{eqnarray} \]
The Model
\[ \begin{aligned} y_t|\eta_t & \sim \text{Poisson}(\exp(\eta_t))\\ \eta_t & = \beta_0 + u_t\\ 1.\ u_t&\sim \text{RW}1(\tau)\\ 2.\ u_t&\sim \text{AR}1(\tau, \phi)\\ \end{aligned} \]
Number of serious earthquakes per year
hyper = list(prec = list(prior = "pc.prec", param = c(1,0.5)))
cmp1 = ~ Intercept(1) + time(year, model = "rw1", scale.model = T,
hyper = hyper)
cmp2 = ~ Intercept(1) + time(year, model = "ar1",
hyper = hyper)
lik = bru_obs(formula = quakes ~ .,
family = "poisson",
data = df)
fit1 = bru(cmp1, lik)
fit2 = bru(cmp2, lik)Estimated trend
Predictions